Implement Fold and Unfold #3167

Open. DuongQLee wants to merge 52 commits into develop.
Conversation

DuongQLee (Collaborator) commented Jul 29, 2024:

  • Added the Fold and Unfold ops.
  • Full benchmark results compared to ROCm: Here
  • Average performance:

| Op     | Dtype | Direction | Time        |
|--------|-------|-----------|-------------|
| Unfold | fp32  | fwd       | 16.43641979 |
| Unfold | fp32  | bwd       | 1.458798342 |
| Unfold | fp16  | fwd       | 15.83955361 |
| Unfold | fp16  | bwd       | 1.459763543 |
| Unfold | bfp16 | fwd       | 15.90593279 |
| Unfold | bfp16 | bwd       | 1.455323877 |
| Fold   | fp32  | fwd       | 1.463731927 |
| Fold   | fp32  | bwd       | 28.09828887 |
| Fold   | fp16  | fwd       | 1.479315364 |
| Fold   | fp16  | bwd       | 28.08557933 |
| Fold   | bfp16 | fwd       | 1.47158127  |
| Fold   | bfp16 | bwd       | 26.84660993 |

@DuongQLee marked this pull request as ready for review on August 2, 2024 at 06:52.
DuongQLee (Collaborator, Author) commented:

@iq136boy Can I have the error log for Jenkins - Fp32 Hip All gfx90a? Thank you.

CAHEK7 (Contributor) left a comment:

Partial review.

Comment on lines +38 to +53:

    int N,
    int C,
    int H,
    int W,
    int P,
    int L,
    int LH,
    int LW,
    int kernel_size_h,
    int kernel_size_w,
    int stride_h,
    int stride_w,
    int padding_h,
    int padding_w,
    int dilation_h,
    int dilation_w,
CAHEK7 (Contributor):

It limits the kernel applicability to tensors under 2 GB.
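(An editorial sketch, not code from this PR.) The concern: with int parameters, the flattened index arithmetic wraps once a tensor exceeds 2^31 elements. A standalone demonstration, using a hypothetical shape:

    // 4 * 512 * 1024 * 2048 = 2^32 elements, so the last flattened
    // index no longer fits in a 32-bit int.
    #include <cstdint>
    #include <cstdio>

    int main()
    {
        const int64_t N = 4, C = 512, H = 1024, W = 2048;
        const int64_t last64 = N * C * H * W - 1;            // 4294967295
        const int32_t last32 = static_cast<int32_t>(last64); // typically wraps to -1
        std::printf("64-bit index: %lld, 32-bit index: %d\n",
                    static_cast<long long>(last64), last32);
        return 0;
    }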

Comment on lines +128 to +131:

    else
    {
        GTEST_SKIP();
    }

Comment on lines +42 to +48:

    size_t N;
    size_t C;
    size_t D;
    size_t H;
    size_t W;
    std::vector<int32_t> kernelSize;
    std::vector<int32_t> stride;
CAHEK7 (Contributor):

While the dimensions are size_t, the strides are 32-bit integers. Can they be consistent?
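One consistent variant, as a sketch only (whether the PR widened the stride types or narrowed the dimensions is not shown here; the struct name is illustrative):

    #include <cstddef>
    #include <vector>

    // Hypothetical consistent layout: dimensions and kernel parameters all
    // use size_t, so mixing them in index arithmetic involves no narrowing.
    struct UnfoldTestCase
    {
        size_t N;
        size_t C;
        size_t D;
        size_t H;
        size_t W;
        std::vector<size_t> kernelSize;
        std::vector<size_t> stride;
    };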

    [[maybe_unused]] const ExecutionContext& /*context*/,
    [[maybe_unused]] const miopen::fold::FoldFwdProblemDescription& problem) const
    {
        return true;
CAHEK7 (Contributor):

The kernel does not support tensors over 2 GB.
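A guard of the kind the reviewer is asking for could look roughly like this (a sketch against the PR's solver types; the GetInputDesc and GetElementSize accessors are assumptions, not confirmed API):

    // Sketch: reject problems whose element count exceeds what 32-bit index
    // parameters can address, instead of returning true unconditionally.
    bool IsApplicable(const ExecutionContext& /*context*/,
                      const miopen::fold::FoldFwdProblemDescription& problem) const
    {
        constexpr size_t kMaxIndexable = 0x7FFFFFFFu; // INT32_MAX
        // GetInputDesc()/GetElementSize() are assumed accessor names.
        if(problem.GetInputDesc().GetElementSize() > kMaxIndexable)
            return false;
        return true;
    }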

Comment on lines 47 to 48:

    [[maybe_unused]] const ExecutionContext& /*context*/,
    [[maybe_unused]] const miopen::fold::FoldFwdProblemDescription& problem) const
CAHEK7 (Contributor):

Either

    const ExecutionContext& /*context*/,

or

    [[maybe_unused]] const ExecutionContext& context,

but not both: an unnamed parameter never triggers an unused-parameter warning, so [[maybe_unused]] on it is redundant.

DuongQLee (Author):

Resolved in commit 918091d.

    {11, 13, 0, 17, 19, {3, 3}, {3, 2}, {0, 0}, {1, 1}, true},
    {11, 13, 0, 17, 19, {3, 3}, {1, 1}, {3, 2}, {1, 1}, true},
    {11, 13, 0, 17, 19, {3, 3}, {1, 1}, {0, 0}, {3, 2}, true},
    {11, 13, 0, 33, 37, {4, 3}, {2, 3}, {5, 2}, {3, 5}, true},
CAHEK7 (Contributor):

It doesn't look like the algorithm relies on isContiguous, and there is no non-contiguous test case to check it.

DuongQLee (Author):

I have added non-contiguous test cases in commit 85c1ee0.

Comment on lines +102 to +103:

    if(!isContiguous)
        std::swap(inputDim.front(), inputDim.back());
CAHEK7 (Contributor):

Could you explain why you swap the input dimensions for the non-contiguous case?

DuongQLee (Author):

For example, if we have a tensor of size {2, 4, 8, 16}, the contiguous strides are {512, 128, 16, 1}.
If we swap the first and last input dims and recompute the strides, the dims used for the computation are {16, 4, 8, 2} and the resulting strides are {1, 16, 2, 64}.
This makes the memory access non-contiguous while the size of the tensor stays the same.
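In code, that construction could look like the sketch below (the helper name is illustrative; it mirrors the swap shown in the diff above, not necessarily the PR's exact helper):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Swap the first and last lengths, lay out strides contiguously for the
    // swapped shape, then swap the first and last stride back. For lengths
    // {2, 4, 8, 16} and contiguous == false this returns {1, 16, 2, 64}:
    // the same 1024 elements, but no longer a contiguous walk over memory.
    std::vector<size_t> GetStrides(std::vector<size_t> lengths, bool contiguous)
    {
        if(!contiguous)
            std::swap(lengths.front(), lengths.back());
        std::vector<size_t> strides(lengths.size());
        strides.back() = 1;
        for(int i = static_cast<int>(lengths.size()) - 2; i >= 0; --i)
            strides[i] = strides[i + 1] * lengths[i + 1];
        if(!contiguous)
            std::swap(strides.front(), strides.back());
        return strides;
    }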

CAHEK7 (Contributor):

Why can't it just be calculated in the reverse direction, like {1, 2, 8, 64}? It is still the same total tensor size and still non-contiguous (i.e. transposed), but without any strange swapping.
Swapping looks suspicious.
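For comparison, the reverse-direction computation suggested here, again only as a sketch:

    #include <cstddef>
    #include <vector>

    // Accumulate strides front-to-back instead of back-to-front. For lengths
    // {2, 4, 8, 16} this yields {1, 2, 8, 64}: a column-major-style layout,
    // non-contiguous in the row-major sense, with no swapping involved.
    std::vector<size_t> GetReversedStrides(const std::vector<size_t>& lengths)
    {
        std::vector<size_t> strides(lengths.size());
        strides.front() = 1;
        for(size_t i = 1; i < lengths.size(); ++i)
            strides[i] = strides[i - 1] * lengths[i - 1];
        return strides;
    }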

DuongQLee (Author) commented Aug 26, 2024:

I have made a PyTorch implementation of this method: https://colab.research.google.com/drive/1TEPXclDXDQ5cLHsoPpQmrBmVUvuPKDLe?usp=drive_link
As you can see, by swapping the front and back dims (the tensor dims become {16, 4, 8, 2}), the memory strides make the tensor non-contiguous.
[screenshot of the notebook output]
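A C++ analogue of the is_contiguous() check the notebook performs (simplified: PyTorch additionally special-cases size-1 dimensions):

    #include <cstddef>
    #include <vector>

    // A layout is contiguous iff its strides equal the row-major strides
    // computed from the lengths; assumes lengths.size() == strides.size().
    bool IsContiguous(const std::vector<size_t>& lengths,
                      const std::vector<size_t>& strides)
    {
        size_t expected = 1;
        for(size_t i = lengths.size(); i-- > 0;)
        {
            if(strides[i] != expected)
                return false;
            expected *= lengths[i];
        }
        return true;
    }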

test/gtest/unfold.hpp: outdated review thread, marked resolved.
long10024070 (Collaborator) commented:

@iq136boy Would you send us the build log of this PR?

iq136boy (Contributor) commented:

Please check the error log:
test_error_log.txt

iq136boy (Contributor) left a comment:

Blocked by the following CI failure. Need to restart after the CI failure is resolved.

    [2024-09-23T13:54:42.218Z] Exception occurred: org.kohsuke.github.HttpException: {"message":"API rate limit exceeded for user ID 49319081. If you reach out to GitHub Support for help, please include the request ID
